Data creation

Collection, spreadsheet design and digitizing

Aud Halbritter and Joe Chipperfield

Introduction

Data creation is the systematic collection of data for a specific purpose, followed by digitizing those data so they can be processed in statistical analyses, shared with others, or reused.

What is the plan?

Time Task
5 min Introduction
15 min Design method
10 min Discuss your decisions
30 min Go outside and collect data
30 min Discussion on data collection and digitizing

Data collection

Data collection methods

  • automatic data collection (e.g. weather station, LiCor)

  • manual data collection (samples, measure, count)

Exercise: design method (15 min)

  • Make groups of 4-5 students.

  • Decide what data you want to collect (e.g. snow depth, length of icicles, plant height).

  • Decide on a method to collect the data (e.g. paper, phone).

  • Design a spreadsheet/protocol. Think about which information you need to record.

Reflect on the decisions you made and on what you would change if you had the means.

Discussion: design decisions (10 min)

  • What decisions did you make?
  • How would you improve the data collection if you could?

Exercise: collect your data (30 min)

  • Go outside and collect your data. It is not important to collect as many data points as possible.

  • Reflect on whether the method you used was suitable for the data you collected.

Discussion: data collection (10 min)

  • How did it go?

  • Was the method appropriate?

  • Did you miss any information?

Key things during data collection

  • Logistical issues

  • Calibration of instruments

  • Multiple measurements/observations/samples

  • Template/protocol for sampling (multiple data collectors, over time)

  • Take notes during data collection

  • Collect metadata that could be useful for wider usage

Source: britishecologicalsociety.org/publications/guides-to

Digitizing and processing data

Discussion: digitizing (15 min)

  • What would be your strategy for digitizing the data?

  • What problems could arise?

Data validation tools

Use data validation tools for data entry.

  • format cells (dates)

  • set ranges

  • drop-down menus
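
The same three checks (cell format, allowed range, fixed list of choices) can also be applied in code after data entry. The following is a minimal sketch, not a prescribed method: the column names (date, site, snow_depth_cm), the allowed sites and the depth range are all made-up assumptions for a hypothetical snow-depth survey.

```python
from datetime import date

# Hypothetical rules; adapt the names, levels and limits to your own protocol.
ALLOWED_SITES = {"A", "B", "C"}      # plays the role of a drop-down menu
DEPTH_RANGE_CM = (0.0, 300.0)        # plays the role of a set range


def validate_row(row):
    """Return a list of problems found in one data-entry row (empty if none)."""
    problems = []

    # Format check: the date must be a valid ISO 8601 date such as 2024-01-15.
    try:
        date.fromisoformat(row["date"])
    except ValueError:
        problems.append(f"date is not an ISO 8601 date: {row['date']!r}")

    # Drop-down equivalent: only predefined site labels are accepted.
    if row["site"] not in ALLOWED_SITES:
        problems.append(f"unknown site: {row['site']!r}")

    # Range check: the value must be numeric and within plausible limits.
    try:
        depth = float(row["snow_depth_cm"])
    except ValueError:
        problems.append(f"snow depth is not a number: {row['snow_depth_cm']!r}")
    else:
        low, high = DEPTH_RANGE_CM
        if not low <= depth <= high:
            problems.append(f"snow depth outside {low}-{high} cm: {depth}")

    return problems


good = {"date": "2024-01-15", "site": "A", "snow_depth_cm": "42.5"}
bad = {"date": "15/01/2024", "site": "D", "snow_depth_cm": "-5"}
print(validate_row(good))
print(validate_row(bad))
```

Returning a list of problems rather than stopping at the first one makes it easier to correct a whole row in one pass.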

Be consistent

  • file names

  • variable names

  • factor levels

  • missing data

  • notes
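
Inconsistencies in factor levels and missing-data codes are easy to detect with a few lines of code. The sketch below is one possible approach under stated assumptions: the set of strings treated as missing and the choice of "NA" as the single missing-data code are illustrative, and the species names are made-up example values.

```python
from collections import defaultdict

# Assumption: these spellings all mean "missing" in the entered data.
MISSING_CODES = {"", "na", "n/a", "-", "missing"}


def standardise_missing(value, missing="NA"):
    """Map every spelling of a missing value to a single code."""
    return missing if value.strip().lower() in MISSING_CODES else value


def inconsistent_levels(values):
    """Find factor levels that differ only in case or surrounding whitespace."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().lower()].add(value)
    # Keep only the levels that were entered in more than one way.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


species = ["Salix", "salix", "Salix ", "Betula"]
print(inconsistent_levels(species))
print([standardise_missing(v) for v in ["12.5", " N/A ", "-", ""]])
```

The check only reports the variants; deciding which spelling is the correct one is left to the person who knows the data.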

Be careful with dates
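
One reason dates need care is that a string such as 03/04/2024 can be read as day/month or as month/day. The short sketch below illustrates the ambiguity with a made-up date and shows the unambiguous ISO 8601 form (YYYY-MM-DD), which is a common recommendation for spreadsheets; which reading is correct for a given dataset is something only the data collector can know.

```python
from datetime import datetime

ambiguous = "03/04/2024"  # made-up example value

# The same text gives two different dates depending on the assumed convention.
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()

print(day_first.isoformat())    # read as day/month/year
print(month_first.isoformat())  # read as month/day/year

# Writing the date as YYYY-MM-DD removes the ambiguity and sorts correctly as text.
iso_text = day_first.isoformat()
```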

Data workflow

  • Digitize data

  • Keep raw data raw

  • No calculations in raw data

  • Code-based data cleaning

  • Clean data

  • Document your data (data about your data)
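
The workflow above can be sketched in code: the raw data are only read, and every correction happens in a script that produces a separate clean dataset. This is a minimal illustration, not a definitive implementation. The raw table, its column names and the cleaning rules are made-up assumptions, and the raw data are held in a string here only to keep the example self-contained; in practice they would be read from a raw file and the result written to a different file.

```python
import csv
import io

# Made-up raw data as digitized from a field sheet (never edited by hand).
RAW_CSV = """date,site,plant_height_cm,notes
2024-01-15,A,12.5,
2024-01-15,a ,13.1,windy
2024-01-15,B,NA,plant damaged
"""


def clean_rows(raw_text):
    """Return cleaned copies of the rows; the raw text itself is not modified."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        height = row["plant_height_cm"].strip()
        cleaned.append({
            "date": row["date"].strip(),
            # Harmonise factor levels: "a " and "A" are the same site.
            "site": row["site"].strip().upper(),
            # Convert to a number; missing values become None.
            "plant_height_cm": None if height in {"", "NA"} else float(height),
            "notes": row["notes"].strip(),
        })
    return cleaned


for row in clean_rows(RAW_CSV):
    print(row)
```

Because the cleaning steps live in code, they are documented, repeatable, and can be rerun whenever new raw data are added.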

Literature

  • BES guide for Data management

  • BioStats book Data collection

  • Broman, Karl W, and Kara H Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10.